to equate new forms of the Preliminary Scholastic Aptitude
used for the frequency estimation approach. The most notable aspect
of the results obtained from the comparison of the four methods was
the marked agreement found among them. The results also indicated
that it is feasible to use IRT methods to equate the two forms of the
PSAT/NMSQT directly. (BV)

In spite of extensive efforts on the part of test development personnel
to ensure that multiple forms of a test are similar in content and difficulty,
form to form differences tend to occur with regular frequency. This situation
requires that some adjustment be made to the scores on different forms of the
test before test results can be interpreted in a meaningful way. The extent
to which such adjustments are free from statistical bias clearly affects the
extent to which later substantive interpretations of test scores are bias
free. Thus, to ensure fairness to examinees taking different forms of a test
and competing for the same positions, accuracy in this process, hereafter
referred to as equating, is essential.

The current thrust of research devoted to the practical applications of
item response theory (IRT) has generated an active interest in score equating.
While this interest is anything but new, it is one which calls attention to
the underlying assumptions of the equating methods used by many large scale
testing programs. In an effort to understand more about the effects of
equating on the integrity of score scales, this study assesses the relative
agreement of four methods: (1) linear; (2) equipercentile; (3) frequency
estimation equipercentile; and (4) IRT estimated true formula score equating.
In addition, a unique application of IRT methods is presented which demonstrates
their flexibility in solving equating problems not amenable to traditional
methods. The data used for the study came from two recent administrations of
the Preliminary Scholastic Aptitude Test/National Merit Scholarship Qualifying
Test (PSAT/NMSQT), a test which is developed and administered by the College
Board Admissions Testing Program.

BACKGROUND AND PURPOSE OF THE STUDY 
A variety of definitions of equated scores have appeared in the literature,
the most general and perhaps restrictive being that of Lord (1977), in which
he argues that "...transformed scores, y*, and raw scores, x, can be called
equated, if and only if it is a matter of indifference to each examinee whether
he is to take test X or test Y." In principle, Lord's definition subsumes
equating of both non-parallel and parallel forms; but, as he explains, one
would not expect these requirements to be met unless strictly parallel forms
were being used. This is because tests (forms) that are not strictly parallel
will differ in level of difficulty. Forms that differ in difficulty cannot,
because of their true score relationship, be equally reliable. It is certainly
not a matter of indifference to an examinee, particularly a high ability
examinee, whether he/she takes one form of a test that is less reliable than a
second form. A somewhat relaxed way of characterizing the notion of equivalent
scores (Angoff, 1971) is to say that scores on two test forms may be considered
equivalent if they have identical frequency distributions for some population
of examinees.

Whatever definition of equivalent scores is adopted, two considerations
are relevant to obtaining them: a design for data collection and a statistical
model for transforming raw scores to a common scale. Angoff (1971) provides a
comprehensive review of equating designs and their concomitant assumptions and
transformation procedures. Designs range from the simple single group (one
group, two test forms), to the random groups (two randomly equivalent groups,
two test forms), to the more complicated anchor test (two not necessarily
equivalent groups, two test forms and one "anchor" test of common items taken
by both groups). The design used to equate the PSAT/NMSQT is a complex
version of the basic anchor test design.

Standard practice in equating new forms of the PSAT/NMSQT is to equate
each new form of the test to two old forms of the Scholastic Aptitude Test
(SAT) through separate sets of common items. One can imagine each of the two
new PSAT/NMSQT forms produced annually as being composed of three sets of
items: (1) items unique to the form; (2) items in common with one old SAT
form; and (3) items in common with a second old SAT form. It is important to
note that both new forms (Form 1 and Form 2) of the PSAT/NMSQT share items in
common with the same two old SAT forms. However, there exists no item
overlap between the two new forms, i.e., each new form is equated back to the
same two old SAT forms but through different sets of common items.

Final scaled scores are determined separately for each new form as
follows: (1) the results of the PSAT/NMSQT Form 1 linear equating to the
first SAT old form and the results obtained from the PSAT/NMSQT Form 1 linear
equating to the second SAT old form are bisected, if the new to old form
relationships are judged to be linear; (2) the results of the PSAT/NMSQT Form
1 equipercentile or frequency estimation equipercentile equating to the first
old SAT form and the results of the PSAT/NMSQT Form 1 equipercentile or
frequency estimation equipercentile equating to the second old SAT form are
averaged, if the new to old form relationships are judged to be curvilinear.
This process is repeated for the PSAT/NMSQT Form 2 equating. It should be
noted that the two PSAT/NMSQT new forms are related (equated to each other)
only through their relationship to the same two old SAT forms. It is not
possible to equate the new forms directly by traditional methods because they
contain no common items and are given to non-randomly equivalent groups.

Constraints imposed by the PSAT/NMSQT data collection design present
several potential problems for the equating process. First, several not
necessarily equivalent groups are represented in the design. The two PSAT/NMSQT
equating samples (selected from the Form 1 and Form 2 populations) are
potentially non-randomly equivalent because of self selection at testing date.
Moreover, the two SAT equating samples (selected from the first and second old
form populations) are non-randomly equivalent with respect to the PSAT/NMSQT
groups to the extent that they differ in level of ability. A second potential
problem stems from differences between the PSAT/NMSQT and SAT in length,
reliability, and level of difficulty.

One might reasonably expect IRT methods to offer several advantages over
traditional methods, at least as far as the PSAT/NMSQT design is concerned.
First of all, according to Lord (1975), "In theory, ICC (IRT) methods are
capable of estimating the equipercentile line of relation between raw scores
when two tests to be equated are not parallel, are given to non-equivalent
groups, and everyone takes an anchor test. Strictly speaking, no other method
known to the writer can accomplish this." Second, as explained in detail at a
later point in this paper, it is possible to employ IRT methods to equate new
forms of the PSAT/NMSQT directly to one another even though they contain no
common items and are given to non-randomly equivalent groups.

The purpose of this study, therefore, is twofold: (1) to compare the
results of linear, equipercentile, frequency estimation equipercentile and IRT
true formula score equating under the constraints of the PSAT/NMSQT design;
and (2) to investigate the feasibility of using IRT methods to equate new
forms of the PSAT/NMSQT to each other directly. Results from the first part
of the study will provide some indication of the relative agreement of the
four methods, whereas those of the second will illustrate the flexibility of
IRT approaches in solving a heretofore intractable testing problem.

RELATED RESEARCH 

A number of researchers have recently investigated the relative performance
of score equating procedures applied to different equating designs in horizontal
and vertical equating situations. While it is fair to say that, on a very
general level, a certain degree of consensus exists as to which procedures
yield the most accurate results, the differences between the findings of these
studies, particularly those related to the stability of results, are a cause
for concern. Slinde and Linn (1977, 1978, 1979) investigated in an indirect
fashion the problem of vertical equating of two forms designed for populations
at different levels of ability. Their results suggested that linear,
equipercentile and IRT equating employing the one-parameter logistic model may
have limitations for the process of vertical equating. This was especially true
when the differences between test difficulty and between ability levels of
equating samples were most pronounced. Their studies imply that an IRT
approach based on the more complex three-parameter logistic model might
provide more useful results for vertical equating situations.

Marco, Petersen, and Stewart (1979) presented perhaps the most comprehensive
empirical study of equating techniques yet to appear. For designs
similar to the PSAT/NMSQT design, they found problems with traditional methods
similar to those found in the Slinde and Linn studies. In particular, when
tests differing in difficulty were given to non-randomly equivalent groups and
equated using an anchor test design, traditional procedures appeared to break
down. In spite of the presence of possible criterion bias confounding some of
their results, the authors suggested that the three-parameter logistic model
would yield the most acceptable results under unusual or extreme design
constraints. However, Marco et al found, as did Slinde and Linn, that the
degrees of dissimilarity between groups and between test forms were both
relevant. When these factors were moderate, traditional methods, both linear
and equipercentile, yielded adequate equatings.

A comparison of the stability of results obtained from traditional and
IRT procedures was made by Kolen (1981), who used a cross-validation group to
establish a criterion for the evaluation of seven IRT methods and two traditional
methods (linear and equipercentile). Kolen had some difficulty evaluating the
results obtained from application of the three-parameter logistic model to
equate new Level I tests (vocabulary and quantitative thinking tests
administered to 9th and 10th graders) and new Level II tests (tests of the same
skills administered to 11th and 12th graders) to old tests of vocabulary and
quantitative thinking that consisted of one level, administered to grades
9-12. He found that "Although the three-parameter estimated observed score
method tended to produce the most stable cross-validation results at Level I
of the tests, the results were of only moderate accuracy at Level II. The
three-parameter estimated true score equivalents method tended to produce the
most stable cross-validation results at Level II but results of moderate

results for both the verbal and mathematical sections. It should be noted
that the study involved tests similar in level of difficulty which were given
to groups of examinees that did not differ greatly in their level of ability;
a situation in which one would expect traditional linear methods to work
well.

If anything, an in-depth look at previous research comparing various
equating procedures leaves the practitioner a little bit bothered. On the one
hand, IRT approaches, especially those using the three-parameter logistic
model, appear to provide the most accurate results and hence seem appropriate
from an empirical perspective as well as a theoretical one. On the other
hand, there is some question regarding their stability, although the
comparatively small amount of scale drift associated with the IRT concurrent
calibration design found by Petersen et al (1981) is evidence in support of their
application to parallel forms of aptitude tests administered to groups that
are similar in ability. In addition, it is important to note that the studies
reviewed indicate that at present the effects of differential reliability and
difficulty of test forms and the effect of the non-randomness of examinee
samples do not appear to be completely understood.

DATA COLLECTION AND METHOD
The data for this study came from two recent administrations of the
PSAT/NMSQT, a test which is developed and administered by the College Board
Admissions Testing Program. Also used were data from two forms of the SAT,
developed and administered by the same organization. Both the PSAT/NMSQT and
SAT are multiple choice tests. The tests differ in length and difficulty, the
PSAT/NMSQT being composed of 65 verbal and 50 mathematical items and the SAT
of 85 verbal and 60 mathematical items. The PSAT/NMSQT consists of two
50-minute sections. The verbal section contains only 5-choice items; the
mathematical section contains a mixture of 4- and 5-choice items. Raw scores
obtained on the PSAT/NMSQT are most typically transformed to scaled scores on
the College Board 200 to 800 scale via the linear equating method described
earlier. For score reporting purposes, the final digit of the score is dropped
and scores are reported on a 20 to 80 scale. PSAT/NMSQT raw scores are
actually formula scores generated from number right scores using a correction
for guessing formula. Raw scores are computed by the formula R - kW, where R
is the number of correct responses, W is the number of incorrect responses and
k = 1/(n-1), n being the number of choices per item. Both the verbal and
mathematical sections of the test were used for this study.
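
The correction for guessing formula R - kW can be illustrated with a short
sketch. The function name and the example counts below are hypothetical and
are not taken from the study; the sketch also assumes all items share the same
number of choices, whereas a section mixing 4- and 5-choice items would apply
the correction separately to each item type.

```python
def formula_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Formula score R - k*W with k = 1/(n - 1).

    Omitted items count neither as right nor as wrong.
    """
    if num_choices < 2:
        raise ValueError("an item needs at least two choices")
    k = 1.0 / (num_choices - 1)
    return num_right - k * num_wrong


# Hypothetical examinee: 40 right, 20 wrong, 5 omitted on 65 five-choice items.
print(formula_score(40, 20, 5))  # 35.0
```

An examinee who guesses at random on every five-choice item is expected to
gain nothing under this rule, which is why scores near zero are described as
being at the chance level.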

The SAT consists of six 30-minute sections: two verbal sections, two
mathematical sections, one Test of Standard Written English (TSWE) and one
experimental section containing an equating test or pretest. The two verbal
sections, one mathematical section and the TSWE contain 5-choice items; the
other mathematical section contains a mixture of 4- and 5-choice items.
Raw scores on the SAT are also typically transformed to scaled scores on the
College Board 200 to 800 scale by linear equating methods. This scale is
retained for score reporting. SAT raw scores are formula scores incorporating
the correction for guessing procedure previously described. Only the two
verbal and two mathematical sections of the test were used for the study.

Figure 1 illustrates the equating design employed for the first part of
the study, which involves assessing the relative agreement of IRT and traditional
methods. PSAT/NMSQT Form 1 and Form 2 are alternate forms of the PSAT/NMSQT,
each containing a subset of items in common with each of the SAT old forms
(hereafter designated SAT First Old Form and SAT Second Old Form). Only the
mathematical section of PSAT/NMSQT Form 1 and the verbal section of PSAT/NMSQT
Form 2 were examined for agreement across methods.

Figure 1: Schematic Diagram of Design Used in Study for Equating PSAT/NMSQT
Form 1 Math and Form 2 Verbal to SAT First and Second Old Forms.

PSAT/NMSQT           SAT
Form 1 Math          First Old Form Math Sections
Form 1 Math          Second Old Form Math Sections
Form 2 Verbal        First Old Form Verbal Sections
Form 2 Verbal        Second Old Form Verbal Sections

Equating samples for all methods, except the frequency estimation approach,
contained approximately 2,000 randomly selected cases from data obtained at
the regular administrations of each of the old and new forms shown in Figure 1.
A total of eight random samples, two each for PSAT/NMSQT Form 1, PSAT/NMSQT
Form 2, SAT First Old Form and SAT Second Old Form, were used in the study.
Since sample sizes of 2,000 are not likely to yield stable estimates for the
frequency estimation procedure, separate, larger samples (approximately 9,000
cases for each PSAT/NMSQT sample and 5,000 cases for each SAT sample) were
used for this approach. Tests for differences between the frequency estimation
equating samples and those drawn for the other methods indicated that no
significant differences existed at the .05 level.

As mentioned previously, four separate PSAT/NMSQT to SAT anchor test
equatings (two verbal and two mathematical) were repeated for each of the four
methods of interest: linear, equipercentile, frequency estimation equipercentile,
and IRT true formula score. Each of these methods is described in greater
detail below. Appendix A provides additional information regarding conversion
procedures.

The basis for the linear conversions under consideration is that scores
on two test forms are equivalent if they correspond to the same number of
standard deviations from the mean in some group of examinees. The linear
methods used were either the Tucker or Levine models (cf. Angoff, 1971).
Both of these models assume that scores on the relevant selection attribute
(the attribute on which the equating samples vary) are collinear with the
scores on the anchor test.
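
The basic linear definition of equivalent scores can be sketched as follows.
This is only the single-group form of the idea (equal standardized deviates in
one group of examinees); it is not the Tucker or Levine model, both of which
additionally use the anchor test to estimate means and standard deviations for
a common group. The function name and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map score x on form X to the form Y scale.

    The equated score lies the same number of standard deviations from
    the form Y mean as x lies from the form X mean.
    """
    mean_x, mean_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("form X scores have zero variance")
    slope = sd_y / sd_x
    return mean_y + slope * (x - mean_x)


# Hypothetical data: form Y scores are spread twice as widely as form X scores.
form_x = [10, 20, 30]
form_y = [30, 50, 70]
print(round(linear_equate(25, form_x, form_y), 6))  # 60.0
```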

Each of the equipercentile models maintains that scores on two test forms
are equivalent if they correspond to the same percentile rank in some group of
examinees. The ordinary equipercentile procedure involves equating scores on
each test form to the anchor test separately within each group. Scores on the
two forms to be equated are then said to be equivalent if they correspond to
the same score on the anchor test. In contrast, the frequency estimation
equipercentile method estimates the frequency distributions of scores on the
two forms of interest for a hypothetical combined group of examinees (students
who took the new form and students who took the old form). Again, scores on
the two forms are said to be equivalent if their corresponding percentile
ranks are the same.
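
A minimal sketch of the ordinary equipercentile idea follows: percentile ranks
are matched between a form and the anchor test within each group, and the two
conversions are chained through the anchor score. The helper names, the
mid-percentile-rank convention, and the linear interpolation between observed
score points are assumptions made for illustration; operational procedures
typically also smooth the distributions, which is not shown here.

```python
from bisect import bisect_right


def percentile_rank_table(scores):
    """Return the sorted distinct scores and the mid-percentile rank of each."""
    n = len(scores)
    values = sorted(set(scores))
    ranks = []
    for v in values:
        below = sum(s < v for s in scores)
        at = sum(s == v for s in scores)
        ranks.append(100.0 * (below + 0.5 * at) / n)
    return values, ranks


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at the ends of xs."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    hi = bisect_right(xs, x)
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


def equipercentile_equate(x, scores_from, scores_to):
    """Find the score on the target test with the same percentile rank as x."""
    vals_from, pr_from = percentile_rank_table(scores_from)
    vals_to, pr_to = percentile_rank_table(scores_to)
    p = interpolate(x, vals_from, pr_from)
    return interpolate(p, pr_to, vals_to)


def chained_equate(x, new_form, anchor_new, anchor_old, old_form):
    """New form -> anchor within the new-form group, then anchor -> old form
    within the old-form group."""
    v = equipercentile_equate(x, new_form, anchor_new)
    return equipercentile_equate(v, anchor_old, old_form)


# Hypothetical toy data for two groups that took the same anchor test.
new_form = [1, 2, 2, 3, 4]
anchor_new = [0, 1, 1, 2, 3]
anchor_old = [0, 1, 1, 2, 3]
old_form = [6, 7, 7, 8, 9]
print(chained_equate(2, new_form, anchor_new, anchor_old, old_form))  # 7.0
```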

Finally, IRT equating models characterize equivalent scores on two test
forms as those scores which correspond to the same estimated level of the
latent trait, ability, or skill underlying both tests. Item response theory
assumes that a mathematical function relates the probability of a correct
response on an item to an examinee's ability (Lord, 1980). As previously
mentioned, the mathematical function (IRT model) employed in this study was the
three-parameter logistic model. The model states that the probability of
a correct response to item i, P_i(\theta), is given by:

$$ P_i(\theta) = c_i + (1 - c_i) \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}} \qquad (1) $$

where a_i, b_i, and c_i are three parameters describing the item and \theta
represents the ability level of an examinee.
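
The three-parameter logistic model can be written as a small function. This is
a sketch only: the function name is hypothetical, the parameter values in the
example are invented, and no scaling constant is included in the exponent
(some presentations of the model multiply the exponent by a constant, which
would simply rescale a).

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    Uses 1 / (1 + exp(-z)), which equals exp(z) / (1 + exp(z)).
    """
    z = a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item: when ability equals difficulty, the probability is
# halfway between the lower asymptote c and 1.
print(round(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2), 6))  # 0.6
```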

The item parameters and examinee abilities for the study were estimated
using the program LOGIST (Wood and Lord, 1976; Wood et al, 1976). The estimates
are obtained by a (modified) maximum likelihood procedure which has been
adapted to accommodate omitted items (Lord, 1974).

Although a variety of equating techniques exist once an IRT model has
been chosen, only estimated true formula score equating (Lord, 1980, Chapter
13) was used for this study. Estimated true formula scores \xi and \eta on two
tests measuring the same ability, \theta, are related by the equations,

$$ \xi = \sum_i \left[ P_i(\theta) - \frac{Q_i(\theta)}{A - 1} \right] \qquad (2) $$

$$ \eta = \sum_j \left[ P_j(\theta) - \frac{Q_j(\theta)}{A - 1} \right] \qquad (3) $$

where A is the number of choices per item, P_i(\theta) and P_j(\theta) represent
the probability of a correct response for items i and j as they appear in the
two forms to be equated, and Q_i(\theta), Q_j(\theta) equal 1 - P_i(\theta) and
1 - P_j(\theta), respectively. Using expressions 2 and 3, it is possible to find
an estimated true formula score \xi corresponding to an estimated true formula
score \eta for any given \theta.
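
The true formula score equating step can be sketched as follows, assuming the
true formula score on a form is the sum over its items of P - Q/(A - 1), with
P given by the three-parameter logistic model and Q = 1 - P. Given a score on
one form, the ability at which that form's true formula score equals the score
is located by bisection, and the other form's true formula score is evaluated
at that ability. All names, the search interval, and the item parameters are
hypothetical; this is not the procedure's operational implementation.

```python
import math


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def true_formula_score(theta, items, num_choices=5):
    """Sum over items of P - Q / (A - 1), with Q = 1 - P."""
    total = 0.0
    for a, b, c in items:
        p = p_correct(theta, a, b, c)
        total += p - (1.0 - p) / (num_choices - 1)
    return total


def theta_for_score(score, items, num_choices=5, lo=-8.0, hi=8.0, tol=1e-8):
    """Bisection search for the ability whose true formula score equals score.

    Relies on the true formula score increasing with ability (all a > 0).
    """
    if not (true_formula_score(lo, items, num_choices) <= score
            <= true_formula_score(hi, items, num_choices)):
        raise ValueError("score lies outside the range reachable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_formula_score(mid, items, num_choices) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate_true_score(score_y, items_y, items_x, num_choices=5):
    """Form Y true formula score -> ability -> form X true formula score."""
    theta = theta_for_score(score_y, items_y, num_choices)
    return true_formula_score(theta, items_x, num_choices)


# Hypothetical (a, b, c) parameters on a common scale for two short forms.
form_y = [(1.0, 0.0, 0.2), (0.8, -0.5, 0.2), (1.2, 0.5, 0.2)]
form_x = [(1.0, 0.3, 0.2), (0.9, -0.2, 0.2), (1.1, 0.8, 0.2)]
print(round(equate_true_score(1.5, form_y, form_x), 3))
```

Because the lower limit of the true formula score is reached only as ability
decreases without bound, scores at or below the chance level have no
corresponding ability, and the search above raises an error for them.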

Expressions 2 and 3 will not provide equated estimated true formula
scores for scores on the two test forms of interest that fall below the chance
score level. Several ways exist for determining the relationship in this
region. Kolen (1981) used linear interpolation. The method that was used for
this study involved estimating the mean and standard deviation of scores below
the chance score level for the two forms of interest and using the estimated
values to establish a linear relationship.

The means and standard deviations of below chance score level scores were
estimated using the following expressions:

$$ M_x = \sum_i \left[ c_i - \frac{1 - c_i}{A - 1} \right] \qquad (4) $$

$$ S_x^2 = \left( \frac{A}{A - 1} \right)^2 \sum_i c_i (1 - c_i) \qquad (5) $$

where,

M_x = the mean of PSAT/NMSQT scores below chance level,
S_x^2 = the variance of PSAT/NMSQT scores below chance level,
A = the number of choices per item, and
c_i = the pseudo-guessing parameter for item i.

Equations 4 and 5 were repeated to obtain M_y and S_y^2, the estimated
mean and variance of below chance level scores for the SAT old form of interest.
Linear parameters for equating PSAT/NMSQT scores below chance level to SAT
scores below chance level were determined as follows (the slope A in
equations 6 through 8 is distinct from the number of choices A in equations
4 and 5):

$$ A = \frac{S_y}{S_x} \qquad (6) $$

$$ B = M_y - A M_x \qquad (7) $$

The linear parameters (A and B) are used to form the following expression:

score (SAT) = A [score (PSAT/NMSQT)] + B          (8)
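
The below chance level line can be sketched in code. The sketch assumes that
the mean and variance are those of the formula score of an examinee who
answers each item correctly with probability c_i, which gives a mean of the
sum of c_i - (1 - c_i)/(A - 1) and a variance of (A/(A - 1))^2 times the sum of
c_i(1 - c_i); that form is an assumption, as are the function names and the
parameter values.

```python
import math


def below_chance_moments(c_params, num_choices=5):
    """Mean and variance of formula scores for an examinee who answers
    item i correctly with probability c_i (assumed form, see lead-in)."""
    a_choices = num_choices
    mean = sum(c - (1.0 - c) / (a_choices - 1) for c in c_params)
    variance = (a_choices / (a_choices - 1)) ** 2 * sum(
        c * (1.0 - c) for c in c_params
    )
    return mean, variance


def below_chance_line(c_x, c_y, num_choices=5):
    """Slope and intercept of the line mapping form X scores to form Y scores:
    slope = S_y / S_x, intercept = M_y - slope * M_x."""
    m_x, v_x = below_chance_moments(c_x, num_choices)
    m_y, v_y = below_chance_moments(c_y, num_choices)
    slope = math.sqrt(v_y / v_x)
    intercept = m_y - slope * m_x
    return slope, intercept


# Hypothetical pseudo-guessing parameters: a 4-item form and a 16-item form.
slope, intercept = below_chance_line([0.2] * 4, [0.2] * 16)
print(round(slope, 6), round(intercept, 6))
```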


The first part of the study involved the comparison of conventional
linear and curvilinear methods with the IRT method. The item calibration plan
for this part of the study is illustrated in Figure 2. Each of the four
separate boxes represents a single LOGIST run yielding item and ability
parameters on a common scale. Data for the separate runs are arranged such
that each PSAT/NMSQT and SAT group is considered to have taken exactly the
same test. For example, considering the first box in Figure 2, both groups
are conceptualized as having taken a test composed of PSAT/NMSQT Form 1 Math
items, common items, and SAT First Old Form Math items. Examinees are
considered to simply not have reached those items to which they were not exposed.
Ability estimates are thus based on the subset of "total" test items actually
answered. The design permits true formula score equating of each PSAT/NMSQT -
SAT pairing illustrated in Figure 2.

Figure 2: Calibration Plan for IRT Equating Used for Comparison with
Conventional Equatings. Each of the four boxes indicates a separate
calibration run. Both new and old form samples contained 2000 cases. Crosses
indicate items that examinee groups actually were exposed to.

First Equating
Group        PSAT/NMSQT Form 1     Common Items    SAT First Old Form
             Math Items (n=31)     (n=19)          Math Items (n=41)
PSAT/NMSQT   X                     X               Not Reached
SAT          Not Reached           X               X

Second Equating
Group        PSAT/NMSQT Form 1     Common Items    SAT Second Old Form
             Math Items (n=30)     (n=20)          Math Items (n=40)
PSAT/NMSQT   X                     X               Not Reached
SAT          Not Reached           X               X

Third Equating
Group        PSAT/NMSQT Form 2     Common Items    SAT First Old Form
             Verbal Items (n=42)   (n=23)          Verbal Items (n=62)
PSAT/NMSQT   X                     X               Not Reached
SAT          Not Reached           X               X

Fourth Equating
Group        PSAT/NMSQT Form 2     Common Items    SAT Second Old Form
             Verbal Items (n=42)   (n=23)          Verbal Items (n=62)
PSAT/NMSQT   X                     X               Not Reached
SAT          Not Reached           X               X
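
The data arrangement for a single calibration run can be sketched as a
response matrix in which every examinee has one column per item in the
combined test, and items an examinee was never exposed to are coded as not
reached. The function, the NOT_REACHED marker, and the toy responses below are
hypothetical; the actual input format of LOGIST is not reproduced here.

```python
NOT_REACHED = None  # hypothetical marker for items an examinee never saw


def build_calibration_matrix(new_form_rows, old_form_rows,
                             n_new_unique, n_common, n_old_unique):
    """Combine two groups into one matrix over the 'total' test.

    Column order: new-form unique items, common items, old-form unique items.
    new_form_rows: responses ordered as unique items then common items.
    old_form_rows: responses ordered as common items then unique items.
    """
    matrix = []
    for row in new_form_rows:
        if len(row) != n_new_unique + n_common:
            raise ValueError("new-form row has the wrong number of responses")
        matrix.append(list(row) + [NOT_REACHED] * n_old_unique)
    for row in old_form_rows:
        if len(row) != n_common + n_old_unique:
            raise ValueError("old-form row has the wrong number of responses")
        matrix.append([NOT_REACHED] * n_new_unique + list(row))
    return matrix


# Toy example: 2 new-form unique items, 1 common item, 2 old-form unique items.
new_group = [[1, 0, 1], [0, 1, 1]]
old_group = [[1, 0, 0]]
for row in build_calibration_matrix(new_group, old_group, 2, 1, 2):
    print(row)
```

Only the common-item column is filled for both groups, and it is this overlap
that places the item and ability parameters from the two groups on a common
scale within the run.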

The calibration plan for the second part of the study, the direct equating
of PSAT/NMSQT Form 1 Verbal scores to the PSAT/NMSQT Form 2 Verbal scores, is
shown in Figure 3. The entire matrix illustrated in Figure 3 represents a
single LOGIST run. As before, each of the four groups is considered to have
taken exactly the same test. This test is conceptualized as containing the
eight components designated by the column headings in Figure 3. This plan
permits direct equating of the PSAT/NMSQT Form 1 Verbal scores to the PSAT/NMSQT
Form 2 Verbal scores even though the two sections contain no overlapping
items. It also permits equating each of the PSAT/NMSQT Verbal scores separately
to each of the SAT Verbal scores, thus allowing replication of the equatings
carried out for these scores in the first part of the study. This replication
was attempted only for the PSAT/NMSQT Form 2 equating to the SAT Second Old
Form.

Figure 3: Calibration Plan for Direct IRT Equating of PSAT/NMSQT Form 1 Verbal
Section to PSAT/NMSQT Form 2 Verbal Section. The entire matrix represents a
single calibration run. Crosses indicate items that examinee groups were
actually exposed to. Each PSAT/NMSQT and SAT sample contains approximately
2,000 cases.

Item sets (columns):
(1) PSAT/NMSQT Form 1 Unique Items, n=20
(2) PSAT/NMSQT Form 1 - SAT First Old Form Common Items, n=22
(3) PSAT/NMSQT Form 1 - SAT Second Old Form Common Items, n=23
(4) PSAT/NMSQT Form 2 Unique Items, n=19
(5) PSAT/NMSQT Form 2 - SAT First Old Form Common Items, n=23
(6) PSAT/NMSQT Form 2 - SAT Second Old Form Common Items, n=23
(7) SAT First Old Form Unique Items, n=40
(8) SAT Second Old Form Unique Items, n=39

Group                  (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)
PSAT/NMSQT Form 1      X    X    X    NR   NR   NR   NR   NR
PSAT/NMSQT Form 2      NR   NR   NR   X    X    X    NR   NR
SAT First Old Form     NR   X    NR   NR   X    NR   X    NR
SAT Second Old Form    NR   NR   X    NR   NR   X    NR   X

(X = exposed; NR = Not Reached)

Two techniques were used to evaluate the results of the various methods.
First, graphical comparisons are presented to give an overview of the relative
agreement of each traditional method with the IRT method or methods. Second,
discrepancy indices (cf. Marco et al, 1979, and Appendix B) for the total score
distribution and three regions (upper 20, middle 60, and lower 20 percent of
this distribution) are provided as a numerical indication of differences
across methods. The discrepancy index described by Marco et al (1979) is
simply a weighted (weighted by the frequency of the equating samples)
mean-squared difference between an estimated score and a criterion score. Since
this study is concerned with agreement with, rather than performance against,
a criterion, the discrepancy index is here better thought of as an index of
agreement. It is thus a weighted mean-squared difference between scaled
scores estimated by each of the traditional methods compared to those estimated
by the IRT method. Details for calculating the discrepancy index are given in
Appendix B.
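
The verbal description of the index can be sketched as a frequency-weighted
mean-squared difference between two sets of scaled scores. The sketch follows
only that description; the exact computation in Appendix B is not reproduced
here, and the function name and the numbers in the example are hypothetical.

```python
def weighted_mean_squared_difference(method_scores, irt_scores, frequencies):
    """Frequency-weighted mean of squared differences between two conversions.

    Each position corresponds to one raw score point: the scaled score from a
    traditional method, the scaled score from the IRT method, and the number
    of examinees in the equating sample at that raw score.
    """
    if not (len(method_scores) == len(irt_scores) == len(frequencies)):
        raise ValueError("all three sequences must have the same length")
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    weighted = sum(
        f * (m - t) ** 2
        for m, t, f in zip(method_scores, irt_scores, frequencies)
    )
    return weighted / total


# Hypothetical conversions at two raw score points with frequencies 1 and 3.
print(weighted_mean_squared_difference([210, 220], [212, 220], [1, 3]))  # 1.0
```

Restricting the three sequences to the raw score points in the upper 20,
middle 60, or lower 20 percent of the distribution would give the regional
versions of the index.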

RESULTS

The results of the first part of the study, which involved the comparison 

of IRT estimated true formula score equating. with three traditioijal methods, 

J| linear, equipercentile and frequency estimation equipercentile, for the four 

PSAT/NMSQT, SAT pairings are summarized in Tables 1-8 and Figures- 4-7 Raw 

e • • * 

score to scale score transformations for each equating meth^aw^ied to each 

PSAT/NMSQT.J3AT pairing are given in Tables. 1-4. The information contained in 

these .tables is also presented graphically in Figures 4-7. Each figure 

contains three plots comparing the traditional equating methods with the IRT 

method. Tables 5-8 contain summary data and discrepany indices computed' as a 

oeans for comparing the traditional equating methods with the IRT method. 

Each table contains data for a single PSAT/NMSQT, SAT pairing. 
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ExSnination of Che information. contained in Table 1 and illustrated in 
Figure 4, indicates close agreement *of all three traditional equating methods 
'with the TRT method for* the PSAT/NMSQT Fojm 1 Math, SAT First Old Form pairing. 
Tfie IRT method tended to yield slightly higher scaled scores than eitner thff 
linear or traditional curvilinear methods at the extremes of the score scale. 
The method that appears to agree most closely with the IRT method is the 
frequency estimation equipe'rcentile method. 

Table 2 and Figure 5 contain information pertaining to the PSAT/NMSQT
Form 1 Math, SAT Second Old Form pairing. Again, close agreement was found
between the raw to scale conversions for all three traditional methods and
those for the IRT method. The IRT method tended to yield slightly lower scaled
scores than any of the three traditional methods. The procedure that appears
to agree most closely with the IRT equating is again the frequency estimation
equipercentile method, although the equipercentile method agrees more closely
at the upper and lower ends of the score range.

The results of the PSAT/NMSQT Form 2 Verbal, SAT First Old Form equating
are presented in Table 3 and Figure 6. It can be seen, from examination of
these data, that the IRT equating method yielded higher scaled scores than the
linear method, particularly at the extremes of the score scale. The traditional
method that agrees most closely with the IRT method appears to be the
equipercentile.

Table 4 and Figure 7 contain the raw to scale conversions resulting from
application of the four equating methods to the PSAT/NMSQT Form 2 Verbal, SAT
Second Old Form pairing. For this equating, the IRT method tended to yield
scaled scores that agreed quite well with those obtained by the traditional
methods, with the exception of linear conversions at the upper end of the
score scale. The method that appears to agree most closely with the IRT

Table 3 (cont.)

ESTIMATED SCALED SCORE

  RAW                           FREQ EST
SCORE  FREQ   LINEAR    EQUI%      EQUI%      IRT

    5    14   221.72   229.86      226.9    232.5
    4    12   213.11   217.54      213.4    223.7
    3     3   204.50   206.97      212.3    214.8
    2     2   195.89   19?.20      208.7    205.7
    1     4   187.2?   192.?0      202.3    196.5
    0     3   178.68   185.79      190.7    187.2
   -1    --   170.?7   179.6?      180.9    177.5
   -2    --   161.?6   173.80      177.1    167.6
   -3     0   15?.??   1?7.9?      170.?    157.0
   -4     2   1??.25   153.?3      161.0    145.1

(? = digit illegible in the source; -- = entry missing or illegible)


Table 4

A COMPARISON OF RAW SCORE TO SCALED SCORE TRANSFORMATIONS
PSAT/NMSQT FORM 2 VERBAL TO SAT SECOND OLD FORM

ESTIMATED SCALED SCORE

  RAW                           FREQ EST
SCORE  FREQ   LINEAR    EQUI%      EQUI%      IRT

   63     2   738.75   749.85      779.4    765.4
   62     0   729.93   742.64      773.1    753.1
   61     1   721.10   735.42      753.4    741.3
   60     1   712.27   728.21      733.9    729.9
   59     3   703.45   717.41      714.1    718.9
   58     4   694.62   705.41      695.4    708.1
   57     1   685.79   693.41      685.5    697.5
   56     5   676.97   684.50      679.0    687.1
   55     2   668.14   678.69      671.8    676.9
   54     7   659.32   67?.89      660.8    666.7
   53     9   650.49   660.05      651.1    656.6
   52     5   641.66   645.89      644.3    646.5
   51     3   632.84   632.80      636.7    636.4
   50    --   624.10   619.72      623.3    626.3
   49    14   615.18   609.19      619.6    616.2
   48    15   606.36   601.21      609.0    606.1
   47     7   597.53   593.23      600.6    596.1
   46    1?   588.71   584.99      591.5    586.2
   45    32   579.88   576.46      582.1    576.3
   44    23   571.05   567.93      572.?    566.6
   43    41   562.23   556.79      563.2    557.0
   42    13   553.40   54?.52      555.3    547.5
   41    26   544.58   536.11      547.0    538.1
   40    35   535.75   526.70      536.7    528.9
   39    33   526.92   513.31      524.3    519.8
   38    41   518.10   511.07      513.4    510.7
   37    23   509.27   503.84      506.1    501.0
   36    59   500.44   496.37      499.0    493.1
   35    55   491.62   488.40      489.8    484.4
   34    55   482.79   480.42      480.2    475.7
   33    70   473.97   472.24      471.3    467.2
   32    34   465.14   463.83      464.7    458.7
   31    70   456.31   455.52      458.5    450.3
   30    67   447.49   447.15      448.9    441.3
   29    58   438.66   438.75      439.2    433.4
   28    73   429.83   430.35      429.8    425.1
   27    38   421.01   421.42      422.1    416.7
   26    35   412.18   412.37      414.3    408.3
   25    61   403.36   403.33      404.7    399.9
   24    75   394.53   394.29      395.5    391.5
   23    77   385.70   385.3?      386.2    383.0
   22    37   376.88   378.00      378.6    374.6
   21    64   368.05   370.62      371.5    366.1
   20    71   359.22   363.16      362.8    357.6
   19    74   350.40   353.21      353.3    349.2
   18    52   341.57   343.26      342.2    340.7
   17    23   332.75   334.60      334.2    332.3
   16    41   323.92   326.91      326.4    323.9
   15    54   315.09   31?.63      316.7    315.5
   14    54   306.27   305.37      306.4       --
   13    42   297.44   295.66      296.9       --
   12    20   288.62   288.74      290.3    290.4
   11    39   279.7?   281.31      282.8    28?.0
   10    26   270.96   273.31      270.4    27?.7
    9    26   262.14   264.48      261.9       --
    8    22   253.?1   25?.67      253.1    256.8
    7    --   244.4?   24?.50      246.2    248.4
    6    24   235.66   2??.20      239.8    239.9
    5    16   226.??   220.01      231.0    231.4
    4    11   218.01   210.50      222.7    222.9
    3    --   209.1?   202.50      215.1    214.3
    2     4   200.35   196.83      210.7    205.3
    1     6   191.53   191.26      205.6    197.3
    0     3   182.70   185.64      192.7    188.9
   -1     3   173.?7   180.14      177.1    180.4
   -2     2   165.05   175.25      167.0    171.9
   -3     1   156.22   170.36      157.4    163.4
   -4     1   147.41   1?5.47      149.0    154.7

(? = digit illegible in the source; -- = entry missing or illegible)

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Figure 4: Comparison of Raw Score to Scale Score Conversions obtained by Traditional 
and IRT Methods for PSAT/NMSQT Form 1 Math - SAT First Old Form Equatings.

Figure 5: Comparison of Raw Score to Scale Score Conversions obtained by Traditional
and IRT Methods for PSAT/NMSQT Form 1 Math - SAT Second Old Form Equatings.

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Figure 6: Comparison of Raw Score to Scale Score Conversions obtained by Traditional 
and IRT Methods for PSAT/NMSQT Form 2 Verbal - SAT First Old Form Equatings. 





[Figure 7: three plots of scaled score against RAW SCORE, each titled
PSAT/NMSQT FORM 2 VERBAL TO SAT SECOND OLD FORM; curve labels include EQUI% and IRT.]

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Table 5 

Summary of Discrepancy Indices for Equating Methods
PSAT/NMSQT Form 1 Mathematical Section - SAT First Old Form
Total Score Distribution and Three Subdivisions

                                                    Equating Methods

                                                                               Frequency
                                                                              Estimation
                                           IRT    Linear  Equipercentile  Equipercentile

Total Score
  Scaled Score Mean                     447.37    448.61          448.52          446.43
  Scaled Score Standard Deviation           --    104.62          104.79          105.26

Total Score Distribution
  Total Error                                      19.63           19.77           1?.41
  Bias                                                --            1.15            -.94
  Standard Deviation of Difference                  4.25            4.30            3.81

Upper 20% of Distribution
  Total Error                                      28.50            8.27            5.04
  Bias                                               .91             .80            -.13
  Standard Deviation of Difference                  5.26              --            2.24

Middle 60% of Distribution
  Total Error                                      13.36           16.??            7.36
  Bias                                              3.38            3.40             .52
  Standard Deviation of Difference                  1.39            2.27            2.66

Lower 20% of Distribution
  Total Error                                      29.61           39.82           49.13
  Bias                                             -4.75           -5.17           -6.02
  Standard Deviation of Difference                  2.65            3.62            3.59

(? = digit illegible in the source; -- = entry missing or illegible)

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Table 6 



Summary of Discrepancy Indices for Equating Methods
PSAT/NMSQT Form 1 Mathematical Section - SAT Second Old Form
Total Score Distribution and Three Subdivisions

                                                    Equating Methods

                                                                               Frequency
                                                                              Estimation
                                           IRT    Linear  Equipercentile  Equipercentile

Total Score
  Scaled Score Mean                     446.82    450.67          449.61          451.04
  Scaled Score Standard Deviation       103.41    106.47          104.93          105.25

Total Score Distribution
  Total Error                                      39.??           31.29           30.?7
  Bias                                              3.85            2.80            4.22
  Standard Deviation of Difference                  5.01            4.84            3.62

Upper 20% of Distribution
  Total Error                                      49.98            7.20           39.17
  Bias                                              5.42             .91            4.89
  Standard Deviation of Difference                  4.54            2.52            3.90

Middle 60% of Distribution
  Total Error                                      44.96           42.46           36.17
  Bias                                              5.74            5.67            5.80
  Standard Deviation of Difference                  3.47            3.??              --

Lower 20% of Distribution
  Total Error                                      14.32           20.61            6.74
  Bias                                             -3.54           -4.24           -1.33
  Standard Deviation of Difference                  1.33            1.63            2.23

(? = digit illegible in the source; -- = entry missing or illegible)

Table 7

Summary of Discrepancy Indices for Equating Methods
PSAT/NMSQT Form 2 Verbal Section - SAT First Old Form
Total Score Distribution and Three Subdivisions

                                                    Equating Methods

                                                                               Frequency
                                                                              Estimation
                                           IRT    Linear  Equipercentile  Equipercentile

Total Score
  Scaled Score Mean                     414.98    410.77          410.31          410.12
  Scaled Score Standard Deviation       100.25    101.05           99.64          101.21

Total Score Distribution
  Total Error                                      42.04           29.46           32.75
  Bias                                             -4.21           -4.67           -4.86
  Standard Deviation of Difference                  4.93            2.77            3.02

Upper 20% of Distribution
  Total Error                                      89.37           55.00           27.88
  Bias                                             -6.37           -6.86           -4.61
  Standard Deviation of Difference                  6.99            2.82            2.57

Middle 60% of Distribution
  Total Error                                      11.07           21.88           25.??
  Bias                                             -1.83           -3.95           -4.28
  Standard Deviation of Difference                  2.78            2.51            2.75

Lower 20% of Distribution
  Total Error                                      90.40           27.35           58.76
  Bias                                             -9.42           -4.70           -6.90
  Standard Deviation of Difference                  1.27            2.28            3.35

(? = digit illegible in the source)


Table 8

Summary of Discrepancy Indices for Equating Methods
PSAT/NMSQT Form 2 Verbal Section - SAT Second Old Form
Total Score Distribution and Three Subdivisions

                                                    Equating Methods

                                                                               Frequency
                                                                              Estimation
                                           IRT    Linear  Equipercentile  Equipercentile

Total Score
  Scaled Score Mean                     414.60    417.25          4??.54          418.30
  Scaled Score Standard Deviation       101.22    102.97          101.?6          102.56

Total Score Distribution
  Total Error                                      22.77           17.16           23.95
  Bias                                              2.66            1.95            3.71
  Standard Deviation of Difference                  3.96            3.66            3.19

Upper 20% of Distribution
  Total Error                                      42.14            8.15           35.08
  Bias                                              3.29           -1.09            4.26
  Standard Deviation of Difference                  5.60            2.64            ?.11

Middle 60% of Distribution
  Total Error                                      21.07           18.46           26.20
  Bias                                              4.01            4.17            4.89
  Standard Deviation of Difference                  2.22            1.05            1.52

Lower 20% of Distribution
  Total Error                                       8.74           21.99            5.26
  Bias                                             -2.44           -2.27            -.74
  Standard Deviation of Difference                  1.67            4.11            2.17

(? = digit illegible in the source)

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method is again the equipercentile. 

Further insight into the differential effects of the equating methods on
the raw to scale score transformations can be gained by examination of the
discrepancy indices computed for the total score distribution and for three
segments of this distribution. It should be re-emphasized at this point that
the traditional equating methods are being assessed in terms of their agreement
with the IRT method. Therefore the term "total error" should be thought of as a
measure of agreement, not necessarily as a measure of error.
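
The way such indices behave can be illustrated with a short sketch. It
assumes, as a working interpretation rather than the definition given
elsewhere in this paper, that bias is the frequency-weighted mean difference
between a traditional conversion and the IRT conversion, that total error is
the frequency-weighted mean squared difference, and that the standard
deviation of the difference is what remains after the squared bias is removed.
The tabled values are numerically consistent with this reading (total error
is approximately the squared bias plus the squared standard deviation of the
difference). The function name and the toy numbers are hypothetical.

```python
import numpy as np

def discrepancy_indices(criterion, other, freq):
    """Frequency-weighted agreement indices between two raw-to-scale
    conversions evaluated at the same raw score points (assumed definitions)."""
    criterion = np.asarray(criterion, dtype=float)
    other = np.asarray(other, dtype=float)
    w = np.asarray(freq, dtype=float)
    w = w / w.sum()                           # relative frequency of each raw score
    d = other - criterion                     # difference at each raw score
    bias = float(np.sum(w * d))               # weighted mean difference
    total_error = float(np.sum(w * d ** 2))   # weighted mean squared difference
    sd_diff = float(np.sqrt(max(total_error - bias ** 2, 0.0)))
    return {"total_error": total_error, "bias": bias, "sd_diff": sd_diff}

# Toy conversions at five raw score points (hypothetical values).
irt = [200, 300, 400, 500, 600]
linear = [205, 303, 401, 499, 597]
freq = [1, 2, 4, 2, 1]
print(discrepancy_indices(irt, linear, freq))
```

Indices for a segment of the distribution (for example the upper 20%) would
be obtained under the same assumptions by passing only the raw score points
and frequencies that fall in that segment.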

Table 5 presents summary data and discrepancy indices for the PSAT/NMSQT
Form 1 Math, SAT First Old Form equating. Examination of the data indicates
that the IRT method yielded a slightly lower estimate of the mean than the
linear and equipercentile methods and a slightly higher estimate than the
frequency estimation equipercentile method. The IRT method produced slightly
smaller estimates of the standard deviation than any of the traditional
methods. Examination of the discrepancy indices for the total score distribution
and for the three segments of this distribution indicates that most of the
discrepancy between the IRT and the linear method occurs at the extremes of
the score distribution; i.e., the IRT method agrees much better with both of
the curvilinear methods (equipercentile and frequency estimation equipercentile)
at the upper 20% of the distribution than it does with the linear method.

The discrepancy indices and summary information for the PSAT/NMSQT Form 1
Math, SAT Second Old Form pairing are given in Table 6. The data indicate
that, for the total score distribution, the linear equating method yielded the
most discrepant results when compared to the IRT method. This discrepancy can
be attributed mostly to disagreement at the upper extreme and middle portion
of the score distribution. The IRT method agrees very well with the
equipercentile method for the upper 20% of the distribution and shows even better
agreement with the frequency estimation equipercentile procedure for the lower
20%. The IRT method yielded a slightly lower estimate of the mean and a smaller
estimate of the standard deviation when compared to the other three equating
methods.

Table 7 contains information pertaining to the discrepancy indices and
summary statistics computed for the PSAT/NMSQT Form 2 Verbal, SAT First Old
Form equating. Examination of the data indicates that the IRT method produced
a slightly higher estimate of the mean than the three traditional methods and
a slightly smaller estimate of the standard deviation than all the methods
except the equipercentile procedure. As was the case with the previous
equatings, the linear method appears to be the most discrepant. It is
interesting to note that in this case, although they provide more agreement with
the IRT method than the linear method does, both curvilinear results are quite
discrepant from the IRT results at the extremes of the distribution.

The results of the discrepancy index computations and summary statistics
for the PSAT/NMSQT Form 2 Verbal, SAT Second Old Form equating are presented
in Table 8. For this equating, the IRT method produced slightly lower estimates
of the mean and smaller estimates of the standard deviation than any of the
traditional methods. The linear method appears to be the most discrepant for
scores in the upper 20% of the distribution. The IRT and equipercentile
results show close agreement for this segment of the distribution, as do the
IRT and frequency estimation equipercentile methods for the lower 20% of the
distribution.

The results of the second part of the study, which investigated the
feasibility of using IRT to equate the PSAT/NMSQT Form 1 and Form 2 Verbal
scores directly, are presented in Tables 9 and 10 and Figure 8. Table 9
contains the raw to scale conversions obtained from the direct PSAT/NMSQT
Form 2 to Form 1 equating compared to each of the four previous equatings
performed for the PSAT/NMSQT Form 2 Verbal, SAT First Old Form pairing. It
should be noted at this point that the calibration design permits the PSAT/NMSQT
form to form equating to be carried out in several different ways; e.g., Form
1 could have been equated to Form 2 and both tests placed on scale through the
Form 2, SAT Second Old Form relationship. The direction of equating used in
the study was chosen to minimize the amount of linear interpolation involved,
thus reducing the possibility of the interpolation process contributing to
error which might confound the results. The column labeled IRT(2A) in Table 9
contains raw to scale conversions that are the result of placing PSAT/NMSQT
Form 2 Verbal scores on the SAT First Old Form scale after equating Form 2 of
the PSAT/NMSQT to Form 1. Figure 8 depicts the information given in Table 9
graphically. Each of the four previously performed equatings is compared to
the PSAT/NMSQT direct form to form equating. Table 10 contains discrepancy
indices computed from a comparison of each of the four previously performed
equatings with the direct form to form equating.
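
The role of linear interpolation in such a chained conversion can be shown
with a brief sketch. It assumes that the form to form equating yields, for
each Form 2 raw score, a generally non-integer Form 1 equivalent, and that the
Form 1 raw to scale conversion is tabled only at integer raw scores, so that
scaled scores for the equivalents are read off by interpolating between
adjacent table entries. The function name and all numbers are hypothetical.

```python
import numpy as np

def chain_to_scale(form2_to_form1, form1_raw, form1_scale):
    """Place Form 2 scores on the reporting scale through Form 1.

    form2_to_form1: Form 1 raw score equivalents of the Form 2 raw scores
    form1_raw, form1_scale: Form 1 raw to scale table (raw scores ascending)
    """
    equivalents = np.asarray(form2_to_form1, dtype=float)
    # Equivalents rarely fall on tabled raw scores, so interpolate linearly.
    return np.interp(equivalents, form1_raw, form1_scale)

# Hypothetical Form 1 conversion table and Form 2 equivalents.
form1_raw = [0, 1, 2, 3]
form1_scale = [200.0, 210.0, 220.0, 230.0]
print(chain_to_scale([0.5, 1.5, 2.25], form1_raw, form1_scale))
```

Each additional link in such a chain adds one more interpolation step, which
is why a direction of equating that shortens the chain limits this possible
source of error.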

Examination of the information contained in Table 9 and illustrated in
Figure 8 shows very close agreement between the IRT results obtained from
equating the PSAT/NMSQT Form 2 Verbal scores to the SAT First Old Form (labeled
IRT) and those obtained from the direct equating of the test to the PSAT/NMSQT
Form 1 Verbal scores (labeled IRT(2A)).

Table 10 contains discrepancy index information comparing the IRT(2A)
results with those obtained from the three traditional equatings and the IRT
equating. The data indicate that the IRT(2A) equating results tended to

Table 9

A COMPARISON OF RAW SCORE TO SCALED SCORE TRANSFORMATIONS
PSAT/NMSQT FORM 2 VERBAL TO SAT FIRST OLD FORM

ESTIMATED SCALED SCORE
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70 
72 
5h 
65 ' 
33 



L INfclR 

729. 56 
7 20'. 95 
712.34 
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614.43 
604. 15 
59 3.>72 
533.45 
5 76.06 
. 568.6" 
560.^6 
550. 22 
53o. 63 
527.97 
- 516.31 
506. 24 
497.23 
433.5? 
480.62 
474.40 
468. I 7 
461 .94 
452.45 
'♦42.0 7 
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EOUI* 


IRT 
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769*7 


7 7fY 7 1 


741.4 


7 5 6 . ft 


7 5 7 QZ^ 


724. 1 


744.9 
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734 0 


7 7 5 a c 
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72 3 . 5 


7 9 A OA 
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7 1 A Q Q 
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693 .2 


r ft 04 3 ft 
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A ft 4 11 
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ft ft 7 7 7 
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653.1 


ft 5 7 5ft 
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A 47 7 7 
□ t j • j f 


" 627.4 


632 .9 


ft 7 7 17 


,617.1 * 


ft ? ? ft 

0 c c * 0 


A. *5 *5 Q 0 
O C t • 0 7 


603.0 


612.6 


ft 1 ? ft A 


595.4 ' 


602.4 




-585.3 


592 .3 


5 Q 0 P ft 


578.9' 


582 .3 


tap * 1 a 

Jot • [ O 


569. 5 


* 572.3 


5 7? 17 


5S7.2 


56? -5 


5 A ? J 7 r 


544.6 


552 . 7 


^5? *? ft ' 


53 3.0 


543 .1 


5 4? ft*> 


531.9 


S3 3 5 


5 7 P ft ft 
j V • no 


522. 5 


524. 1 


*5?7 '7A 


5U.6 
^502.1 


514.8 


5 17 ft Q 


50^'.6 


5 04 5 7 


494*. 6 


V36 .6 
</487.6 


4Q 5 7 A 


486.5 * 


4 ^ft ^ ft 


* 476.8 


478.8 


4 77 7 5 


467.9 


470 . 1 


4ft A 5 4 


459. 1 


1 461 . 5 


4 59 ft 7 * 


451.5 


4 53 . 0 4 


4 51 7 7 


443.5 


444.6 




434.6 


436.3 


4 74 ftp 


. 425.5 


428 . 1 


4? ft 4 7 


416.1 


419 . J 9 


4 18.32 


408.3 


41L9 


• 410.29 


401.2 


403.9 


402.33 


392.6 


395 .9 


394.40 


1*3. 7 


388.0 


336.49 


372 .7 


380.0 


^ 378.60 


364. 8 


372 .1 


3?0.70, 


358.2 


364.2 


362. 79 


340.6 


356.2 


' 354.86 


339. 1- 


348.3 


346.90 


330. 1 ' 


340.3 


338. 92 


3.24.2 


332.3 


330.90 



Table 9 (cont.)

A COMPARISON OF RAW SCORE TO SCALED SCORE TRANSFORMATIONS
PSAT/NMSQT FORM 2 VERBAL TO SAT FIRST OLD FORM (CONT.)

ESTIMATED SCALED SCORE

  RAW                           FREQ EST
SCORE  FREQ   LINEAR    EQUI%      EQUI%      IRT   IRT(2A)

   16    56   316.40   31?.62      318.?    324.2    322.85
   15    --   307.79   309.12      308.3    316.1    314.73
   14    42   299.18   300.?8      298.8    308.0    306.67
   13    ?1   290.5?   293.35      289.9    299.9    298.53
   12    ?5   281.97   287.23      283.?    291.7    290.37
   11    --   273.36   280.42      276.4    283.?    282.18
   10    31   264.75   271.93      265.7    275.1    273.9?
    9    27   256.15   263.5?      257.7    266.8    265.68
    8    25   247.5?   255.23      250.9    258.3    257.37
    7    --   238.93   247.21      245.?    249.8    249.02
    6    17   230.?2   239.1?      238.2    241.2    240.62
    5    14   221.72   229.86      226.9    232.5    232.17
    4    12   213.11   217.54      213.4    223.7    223.63
    3     3   204.50   206.97      212.3    214.8    215.02
    2     2   195.89   19?.20      208.7    205.7    206.32
    1     4   187.2?   192.?0      202.3    196.5    197.50
    0     3   178.68   185.79      190.7    187.2    188.5?
   -1    --   170.?7   179.6?      180.9    177.5    179.3?
   -2    --   161.?6   173.80      177.1    167.6    169.3?
   -3     0   15?.??   1?7.9?      170.?    157.0    160.06
   -4     2   1??.25   153.?3      161.0    145.1    149.32

(? = digit illegible in the source; -- = entry missing or illegible)

Table 10

Summary of Discrepancy Indices for Equating Methods
PSAT/NMSQT Form 2 Verbal - SAT First Old Form
Total Score Distribution and Three Subdivisions

                                                         Equating Methods

                                                                                         Frequency
                                                                                        Estimation
                                       IRT(2A)       IRT    Linear  Equipercentile  Equipercentile

Total Score
  Scaled Score Mean                     413.80    414.98    410.77          410.31          410.12
  Scaled Score Standard Deviation       100.46    100.25    101.05           99.64          101.21

Total Score Distribution
  Total Error                                       1.76     37.66           21.23           23.23
  Bias                                              1.18     -3.04           -3.49           -3.68
  Standard Deviation of Difference                   .61      5.33            3.01            3.11

Upper 20% of Distribution
  Total Error                                        .44     92.97           51.?1           27.21
  Bias                                               .35     -6.02           -6.51           -4.27
  Standard Deviation of Difference                   .56      7.53            2.98            3.00

Middle 60% of Distribution
  Total Error                                       2.28      8.30           12.64           15.74
  Bias                                              1.50      -.32           -2.44           -2.78
  Standard Deviation of Difference                  2.13      2.86            2.58            2.83

Lower 20% of Distribution
  Total Error                                       1.47     73.02           17.71           42.41
  Bias                                               .99     -8.43           -3.71           -5.91
  Standard Deviation of Difference                   .70      1.39            1.98            2.75

(? = digit illegible in the source)

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Figure 8: Comparison of Raw Score to Scale Score Conversions obtained by Traditional 

and two IRT Methods for PSAT/NMSQT Form 2 Verbal - SAT First Old Form Equating.

yield a slightly lower estimate of the scaled score mean than that obtained
from the IRT equating. As noted previously, both curvilinear methods tended
to exhibit considerable discrepancy from the IRT method at the extremes of the
score distribution. This effect is not quite as pronounced for the IRT(2A)
method.



It was noted in an earlier section of this paper that the calibration
design used for the PSAT/NMSQT Verbal form to form equating permits the
replication of the IRT equating performed for the PSAT/NMSQT Form 2 Verbal,
SAT Second Old Form pairing. The results of this equating, designated IRT(2B),
are presented in Tables 11 and 12 and Figure 9. Table 11 provides raw to
scale conversions for the four previous equatings carried out for this pairing
as well as those obtained for the IRT(2B) method. Figure 9 contains a single
graph comparing the raw to scale conversions obtained from the IRT and the
IRT(2B) methods. Table 12 presents the discrepancy indices computed for the
total score distribution and three segments of the score distribution for the
IRT-IRT(2B) comparison only.

Examination of the data contained in Tables 11 and 12 and Figure 9
indicates very close agreement between the two IRT methods. It can be seen,
from examination of the tabled data, that the IRT(2B) method tended to yield
a very slightly higher estimate of the scaled score mean and a slightly smaller
estimate of the standard deviation when compared to the IRT method.


CONCLUSIONS 

The absence of a true criterion against which to compare the equating
methods used in this study somewhat confounds the interpretation of the
results. The study assumes that the most appropriate method to use when

Table 11



A COMPARISON OF RAW SCORE TO SCALED SCORE TRANSFORMATIONS
PSAT/NMSQT FORM 2 VERBAL TO SAT SECOND OLD FORM

ESTIMATED SCALED SCORE

  RAW                           FREQ EST
SCORE  FREQ   LINEAR    EQUI%      EQUI%      IRT   IRT(2B)

   63     2   738.75   749.85      779.4    765.4    763.87
   62     0   729.93   742.64      773.1    753.1    751.47
   61     1   721.10   735.42      753.4    741.3    739.54
   60     1   712.27   728.21      733.9    729.9    727.97
   59     3   703.45   717.41      714.1    718.9    716.71
   58     4   694.62   705.41      695.4    708.1    705.74
   57     1   685.79   693.41      685.5    697.5    696.01
   56     5   676.97   684.50      679.0    687.1    684.47
   55     2   668.14   678.69      671.8    676.9    674.09
   54     7   659.32   67?.89      660.8    666.7    663.82
   53     9   650.49   660.05      651.1    656.6    653.65
   52     5   641.66   645.89      644.3    646.5    643.56
   51     3   632.84   632.80      636.7    636.4    633.53
   50    --   624.10   619.72      623.3    626.3    623.57
   49    14   615.18   609.19      619.6    616.2    613.70
   48    15   606.36   601.21      609.0    606.1    603.91
   47     7   597.53   593.23      600.6    596.1    594.22
   46    1?   588.71   584.99      591.5    586.2    584.64
   45    32   579.88   576.46      582.1    576.3    575.17
   44    23   571.05   567.93      572.?    566.6    565.81
   43    41   562.23   556.79      563.2    557.0    556.57
   42    13   553.40   54?.52      555.3    547.5    54?.45
   41    26   544.58   536.11      547.0    538.1    539.?1
   40    35   535.75   526.70      536.7    528.9    529.50
   39    33   526.92   513.31      524.3    519.8    520.?6
   38    41   518.10   511.07      513.4    510.7    511.??
   37    23   509.27   503.84      506.1    501.0    503.21
   36    59   500.44   496.37      499.0    493.1    494.5?
   35    55   491.62   488.40      489.8    484.4    486.??
   34    55   482.79   480.42      480.2    475.7    477.48
   33    70   473.97   472.24      471.3    467.2    46?.9?
   32    34   465.14   463.83      464.7    458.7    460.51
   31    70   456.31   455.52      458.5    450.3    452.06
   30    67   447.49   447.15      448.9    441.3    443.61
   29    58   438.66   438.75      439.2    433.4    435.16
   28    73   429.83   430.35      429.8    425.1    426.70
   27    38   421.01   421.42      422.1    416.7    41?.2?
   26    35   412.18   412.37      414.3    408.3    409.73
   25    61   403.36   403.33      404.7    399.9    401.?0
   24    75   394.53   394.29      395.5    391.5    392.65
   23    77   385.70   385.3?      386.2    383.0    384.07
   22    37   376.88   378.00      378.6    374.6    375.4?
   21    64   368.05   370.62      371.5    366.1    366.88
   20    71   359.22   363.16      362.8    357.6    35?.28
   19    74   350.40   353.21      353.3    349.2    349.70
   18    52   341.57   343.26      342.2    340.7    341.15
   17    23   332.75   334.60      334.2    332.3    332.63
   16    41   323.92   326.91      326.4    323.9    324.15
   15    54   315.09   31?.63      316.7    315.5    315.71
   14    54   306.27   305.37      306.4       --    307.31
   13    42   297.44   295.66      296.9       --    298.95
   12    20   288.62   288.74      290.3    290.4    290.62
   11    39   279.7?   281.31      282.8    28?.0    282.30
   10    26   270.96   273.31      270.4    27?.7    274.00
    9    26   262.14   264.48      261.9       --    265.70
    8    22   253.?1   25?.67      253.1    256.8    257.40
    7    --   244.4?   24?.50      246.2    248.4    249.09
    6    24   235.66   2??.20      239.8    239.9    240.78
    5    16   226.??   220.01      231.0    231.4    232.45
    4    11   218.01   210.50      222.7    222.9    224.12
    3    --   209.1?   202.50      215.1    214.3    215.79
    2     4   200.35   196.83      210.7    205.3    207.47
    1     6   191.53   191.26      205.6    197.3    199.14
    0     3   182.70   185.64      192.7    188.9    190.80
   -1     3   173.?7   180.14      177.1    180.4    182.45
   -2     2   165.05   175.25      167.0    171.9    174.05
   -3     1   156.22   170.36      157.4    163.4    165.55
   -4     1   147.41   1?5.47      149.0    154.7    156.84

(? = digit illegible in the source; -- = entry missing or illegible)

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RAW SCORE 

PSAT/NMSQT FORM 2 VERBAL TO SAT SECOND OLD FORM 

Figure 9: Comparison of Raw Score to Scale Score Conversions obtained by IRT and IRT(2B) Methods
for PSAT/NMSQT Form 2 Verbal - SAT Second Old Form Equatings.

Table 12

Summary of Discrepancy Indices for IRT Versus IRT(2B)
PSAT/NMSQT Form 2 Verbal - SAT Second Old Form
Total Score Distribution and Three Subdivisions

Total Score Distribution

Equating   Scaled Score   Scaled Score         Total            Standard Deviation
Method     Mean           Standard Deviation   Error    Bias    of the Difference

IRT(2B)    415.37         100.90               1.64     .78     1.02
IRT        414.60         101.22

Upper 20% of Distribution

Equating   Total            Standard Deviation
Method     Error    Bias    of the Difference

IRT(2B)    2.21     -.50    1.40

Middle 60% of Distribution

Equating   Total            Standard Deviation
Method     Error    Bias    of the Difference

IRT(2B)    1.83     1.26    .50

Lower 20% of Distribution

Equating   Total            Standard Deviation
Method     Error    Bias    of the Difference

IRT(2B)    .46      .49     .46

equating dissimilar tests given to non-randomly equivalent groups is an IRT
method employing the three-parameter logistic model. There has been substantial
support, in the literature cited previously, for this assumption. However,
conclusions drawn from the study, based on the assumption that the IRT method
is most appropriate, must be considered tentative and subject to verification
through replication. A modest attempt at replication was made by equating
the PSAT/NMSQT Form 2 Verbal scores to the SAT Second Old Form using item
parameter estimates obtained from the two different calibration designs. The
comparability of the results of these two equatings lends some credence to the
comparisons made between the IRT and traditional methods.

The most notable aspect of the results obtained from the first part of
the study, which was designed principally to compare the IRT method to the
traditional methods, was the marked agreement found among the four procedures.
It appears that all the methods perform fairly similarly for the major portion
of the score reporting range, with departures occurring mostly at the extremes
of the distribution. As expected, the traditional curvilinear methods agree
more closely with the IRT method than does the linear method for these portions
of the score scale.

These results run somewhat counter to those suggested by previous research.
Previous research involving the equating of tests at different levels of
difficulty given to non-randomly equivalent groups suggests that the
three-parameter IRT model should work effectively, whereas the traditional
methods may not (see Slinde and Linn, 1977; Lord, 1975, 1977). Hence the expected
results would be that the traditional methods would not closely coincide with
the IRT results, nor would the linear and traditional equipercentile methods be
expected to coincide, simply because the differences in difficulty of the
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two forms would force a curvilinear relationship. The unexpected agreement
across methods in this study may be partially explained by the fact that the
distributions of scores on the two tests were quite similar in shape. In
fact, this was somewhat to be expected, given that the tests are constructed
to be appropriate for the ability levels of the populations they were
administered to. The use of the word populations is important in clarifying the
differences between the results of this study and previous research. An
underlying assumption of the previous research is that the groups taking the
forms are non-randomly equivalent groups from the same population. The
difference in difficulty between the two forms is hence sufficient to cause a
curvilinear relationship and at the same time theoretically necessitate a true
score (IRT) equating method (see Lord, 1980). In this study, while the forms
do essentially differ in difficulty, they are constructed to yield the same
sort of distribution for the two non-randomly equivalent groups. Thus, the
linear and curvilinear methods closely coincide, although the same theoretical
argument (Lord, 1980) pertaining to the equating of raw scores on tests of
unequal difficulty would suggest use of a true score or IRT equating method.
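
The role of distribution shape in this argument can be illustrated with a small sketch. It is not the procedure used in the study: the score distributions, the function names, and the simple interpolated equipercentile routine are all assumptions made for illustration. When the second form's distribution is an exact linear rescaling of the first, the linear and equipercentile conversions coincide. When its shape differs, the two conversions separate, and more so at the extremes than in the middle.

```python
import bisect
from statistics import NormalDist, fmean, pstdev


def linear_equate(x, xs, ys):
    """Linear equating: match the means and standard deviations."""
    return fmean(ys) + (pstdev(ys) / pstdev(xs)) * (x - fmean(xs))


def equipercentile_equate(x, xs_sorted, ys_sorted):
    """Map x to the y score at the same rank position (linear interpolation).

    Assumes equal-sized, sorted samples without ties.
    """
    i = bisect.bisect_right(xs_sorted, x) - 1
    i = max(0, min(i, len(xs_sorted) - 2))
    lo, hi = xs_sorted[i], xs_sorted[i + 1]
    frac = (x - lo) / (hi - lo)
    return ys_sorted[i] + frac * (ys_sorted[i + 1] - ys_sorted[i])


# Hypothetical score distributions (not data from the study).
n = 2000
form_x = [NormalDist(40, 10).inv_cdf((i + 0.5) / n) for i in range(n)]
same_shape = [0.9 * x + 8 for x in form_x]  # same shape, new location and scale
skewed = sorted(50 + 0.9 * (x - 40) + 0.01 * (x - 40) ** 2 for x in form_x)


def gap(x, ys):
    """Absolute difference between the linear and equipercentile conversions."""
    return abs(linear_equate(x, form_x, ys) - equipercentile_equate(x, form_x, ys))


for label, x in [("low", form_x[20]), ("middle", form_x[n // 2]), ("high", form_x[-21])]:
    print(f"{label:<7} same shape: {gap(x, same_shape):.4f}   skewed: {gap(x, skewed):.4f}")
```

The "low" and "high" points sit near the 1st and 99th percentiles of the hypothetical Form X distribution, which is where the disagreement for the skewed case is largest.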
One can only conclude, from the results of the first part of the study,
that the PSAT/NMSQT-SAT equating is essentially linear, at least for the
middle portion of the score reporting range. The fact that the linear method
differs from the curvilinear methods at the extremes of the score distribution
is evidence that this method, although a good approximation to the curvilinear
methods, is not quite appropriate for extreme scores.

Although the traditional curvilinear methods agreed more closely with the
IRT method than did the linear procedures at the extremes of the score scale,
some discrepancy is apparent. These discrepancies are most probably due to
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the fact that the stability of the traditional methods is affected by the
scarcity of data at those extremes. Because it is possible to determine the
relationship of true scores on two forms of a test for any given θ, regardless
of whether it is actually observed, IRT methods are not affected by a lack of
data at the upper end of the distribution. This is not true, however, for
below chance score level conversions. As explained previously, the
three-parameter logistic model does not provide this relationship directly and
some method of inferring it must be developed. Therefore, it is difficult to
arrive at conclusions indicating that any of the equating methods evaluated
provide more appropriate transformations for scores at the extreme low ends of
the score scale.
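
A minimal sketch of the true-score relationship described above may help. It assumes number-right true scores rather than the true formula scores used in the study, the conventional scaling constant of 1.7, and invented item parameters, so it illustrates the idea rather than reproducing the equating. For any θ the model yields a true score on each form, so a conversion exists even where no examinees were observed. A true score at or below the sum of the lower asymptotes (the chance level) corresponds to no θ, so the sketch refuses to convert it.

```python
import math


def p_correct(theta, a, b, c):
    """Three-parameter logistic item response function (1.7 scaling assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def true_score(theta, items):
    """Number-right true score: the sum of item probabilities at theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(target, items, lo=-8.0, hi=8.0):
    """Invert the increasing true-score curve by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equate_true_score(score_x, form_x, form_y):
    """Convert a Form X true score to the Form Y true score at the same theta."""
    chance = sum(c for _, _, c in form_x)
    if not chance < score_x < len(form_x):
        raise ValueError("no theta corresponds to this true score under the model")
    return true_score(theta_for_true_score(score_x, form_x), form_y)


# Hypothetical (a, b, c) item parameters; Form Y is the harder form.
form_x = [(1.0, -1.0, 0.2), (0.8, -0.5, 0.2), (1.2, 0.0, 0.2),
          (0.9, 0.5, 0.2), (1.1, 1.0, 0.2)]
form_y = [(a, b + 1.0, c) for a, b, c in form_x]

for score in (2, 3, 4):
    print(score, round(equate_true_score(score, form_x, form_y), 2))
```

The `ValueError` branch marks the region the text refers to: conversions for scores below the chance level have to be inferred by some additional rule that the model itself does not supply.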

The results of the second part of the study, which investigated the
feasibility of using IRT methods to equate the two forms of the PSAT/NMSQT
directly, were encouraging. Very little difference was found between the
scaled scores obtained from the direct form to form IRT equating and those
obtained from the IRT equating of the PSAT/NMSQT Form 2 Verbal scores to the
SAT First Old Form. This offers some support for the feasibility of using IRT
methods for form to form equating of the PSAT/NMSQT.

The fact that the form to form equating appears feasible is important
for the following reason. When the two forms of the test are equated separately
to the same old SAT form, it is seldom the case that the maximum raw scores on
the two forms will be transformed to the same scaled scores. This situation
could potentially cause some unfairness to candidates taking the form of the
test which yields a lower maximum raw score to scaled score conversion. Typically,
scores in the upper region of one of the forms are adjusted slightly such that
maximum raw scores on the two forms are transformed to the same scaled score.

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

This adjustment introduces an unknown degree of error into the equating of the
scores at the upper end of the score scale. However, if the two forms are
equated directly using the calibration design described in the second part of
this study and then placed on scale through their relationship to the same old
SAT form, the maximum raw score on both forms will convert to the same scaled
score, thus eliminating the necessity of an adjustment to scores in the upper
end of the score range.

To summarize, results of the study indicate that traditional linear,
equipercentile, frequency estimation equipercentile, and IRT equating using
the three-parameter logistic model provide comparable results for the major
portion of the score reporting range even though non-parallel tests given to
non-randomly equivalent groups were equated. Where the methods fail to
coincide (at the upper end of the distribution), the IRT method is assumed to
provide the most appropriate conversions. In addition, a unique application
of IRT methods, that of equating non-parallel tests given to non-randomly
equivalent groups in the absence of a set of common items or anchor test, has
been shown to be feasible.
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